Proposal by Anastasiya Ivanova for Framework for building semantic networks

Proposed by Anastasiya Ivanova


Why would I like to work on this particular project?

- Data and knowledge management is an interesting, relevant and pressing problem that I have been interested in for some time
- Smalltalk is a modern, powerful language and technology with a dynamic and open environment, offering many benefits both in general and for this particular project (see below)
- The project addresses a real-life need
- The expected result is a close-to-release application that will actually be used and developed further

Why would I be a good choice for the project?

- I have knowledge of graph theory, which is the basis of semantic networks
- I want to carry out this project, continue learning Smalltalk and, if I can, extend its libraries

Details of my non-Smalltalk experience:

- I have both theoretical and practical experience in programming and IT from my ... years of study at Tver State University
- My programming background includes both "traditional" languages and technologies such as C, C++, C#, Java, HTML, XML and CSS, and some "slightly unusual" ones such as Smalltalk
- My practical experience includes real-life projects in the field of geophysical applications (for about 2 years so far), data acquisition and processing, along with several smaller academic projects (information systems, web development, etc.)
- I have run into the problem of data and knowledge management several times in my practice and became very interested in it; I discussed some of my ideas with the project's mentor, and they were included in the proposal

Contact information:

Mentor Dennis Schetinin
Dennis.Schetinin@gmail.com
Student Ivanova Anastasiya
IvanovaAnastasiua@yandex.ru
Skype name: li4inka_kote

What will the results look like?

- The application is meant for structuring incoming data and turning it into knowledge
- It should accept data from different sources, for instance e-mail, instant messages, blogs (over RSS), micro-blogs (over a relevant API, such as Twitter's) and user input
- The incoming messages should be aggregated into a personal knowledge base represented by a semantic network
- The aggregation can be performed both manually, by user actions, and automatically, by processing scripts
- The application will be built as a web service with both programming and user interfaces
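
The two aggregation modes listed above can be illustrated with a minimal sketch. It is written in Python only for brevity (the project itself targets Smalltalk), and every name in it (`SemanticNetwork`, `link`, `hashtag_script`, the relation names) is hypothetical and not part of any existing library. It assumes the knowledge base is stored as a set of (subject, relation, object) triples.

```python
import re
from dataclasses import dataclass, field


@dataclass
class SemanticNetwork:
    """Hypothetical knowledge base: a set of (subject, relation, object) triples."""
    triples: set = field(default_factory=set)

    def link(self, subject, relation, obj):
        # Manual aggregation: the user states a relation explicitly.
        self.triples.add((subject, relation, obj))


def hashtag_script(network, message_id, text):
    """Automatic aggregation: a processing script (assumed example rule)
    that links a message to every #hashtag found in its text."""
    for tag in re.findall(r"#(\w+)", text):
        network.link(message_id, "tagged", tag.lower())


network = SemanticNetwork()
# Automatic step: the script derives relations from the message text.
hashtag_script(network, "msg-1", "Finish the #GSoC proposal in #Smalltalk")
# Manual step: the user attaches the same message to a project.
network.link("msg-1", "belongs_to", "project:semantic-networks")
```

Both paths end in the same `link` call, so relations created by scripts and by the user live in one network and can be browsed together.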

Features:

- An alternative approach to data engineering and knowledge management, based on semantic relations instead of the text analysis that is common today
- Our approach lets users take part in the process of data analysis, "knowledge extraction" and management. Users can thereby build and customize a knowledge base for their own needs and tasks.

The proposed architecture will include:

- Input subsystem (for accepting incoming data from different sources and services)
- Knowledge base (representing accepted objects and semantic relations between them)
- Knowledge browser/searcher (allowing the user to find and use useful information in the semantic network)
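
As a rough illustration of how the three parts could fit together, here is a small sketch in Python (chosen for brevity; the project itself targets Smalltalk). The class and function names, the shape of the raw input dictionaries and the relation names are all assumptions made for the example, not a description of the final design.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """Common form that every input source is normalized to (hypothetical)."""
    source: str
    author: str
    text: str


def from_email(raw):
    # Input subsystem: adapter for an e-mail-like dict (assumed shape).
    return Item("email", raw["from"], raw["subject"])


def from_microblog(raw):
    # Input subsystem: adapter for a micro-blog-like dict (assumed shape).
    return Item("microblog", raw["user"], raw["status"])


class KnowledgeBase:
    """Knowledge base: stores typed relations between accepted objects."""

    def __init__(self):
        self._out = defaultdict(set)  # (node, relation) -> related nodes

    def relate(self, subject, relation, obj):
        self._out[(subject, relation)].add(obj)

    def related(self, subject, relation):
        # Knowledge browser: follow one relation type from a node.
        return set(self._out.get((subject, relation), set()))


kb = KnowledgeBase()
mail = from_email({"from": "dennis", "subject": "Plan for iteration 1"})
post = from_microblog({"user": "anastasiya", "status": "Started iteration 1"})
kb.relate(mail, "about", "iteration-1")
kb.relate(post, "about", "iteration-1")
kb.relate(post, "replies_to", mail)
```

Normalizing every source to one `Item` type keeps the knowledge base independent of where the data came from, so a new source only needs a new adapter.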

One issue I think I can bypass in this project is data persistence. Smalltalk has "free" image persistence, so I don't have to deal with the problem in the first steps of the project. Later on, as the software matures, it will be possible to add a persistence engine (such as object-relational mapping, an object-oriented database, or a modern document-oriented store), as Smalltalk has a number of solutions available here.

Advantages over Google Desktop (or other similar tools)

- Users can manage incoming information and attach meaning to raw data
- Presence of semantics: meaningful relations between elements of the knowledge base
- A single centre for processing heterogeneous data and managing services
- In the long term: collaboration built on individual semantic networks

Which frameworks and existing solutions could we consider using in the project?

- VisualWorks as a programming and execution environment
- Seaside for web-application development
- Smalltalk frameworks and class libraries for the Twitter API, e-mail processing, web-service interaction etc. (available both in the VisualWorks distribution and at the Cincom Open Repository)

What methodologies will I use?

Iterative development based on Agile principles, particularly Test-Driven Development.
I should add that agile methodology and practices are well supported by Smalltalk thanks to its dynamic nature and open architecture.

Where do I see the risks?

- Usability: it's a big challenge to create an easy-to-use and efficient user interface for managing large volumes of data
- Performance: dealing with large graphs representing semantic relations between heterogeneous objects could be a slow and resource-intensive task
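
One common way to keep graph exploration affordable is to limit how far a query may walk from its starting node. The sketch below is only an illustration of that idea, not a decision made in this proposal: it is a depth-limited breadth-first search in Python (the project itself targets Smalltalk), and the function name, the dictionary-based graph and the sample node names are all assumptions.

```python
from collections import deque


def neighbourhood(graph, start, max_depth):
    """Return a dict of nodes reachable from `start` within `max_depth`
    hops, mapped to their distance in hops.

    `graph` is a plain dict mapping a node to an iterable of neighbours;
    this shape is an assumption made for the sketch.
    """
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        depth = seen[node]
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen[neighbour] = depth + 1
                queue.append(neighbour)
    return seen


graph = {
    "task:write-proposal": ["tag:gsoc", "project:semantic-networks"],
    "tag:gsoc": ["task:submit-to-melange"],
    "task:submit-to-melange": ["alert:deadline"],
}
nearby = neighbourhood(graph, "task:write-proposal", max_depth=2)
```

With `max_depth=2` the walk stops before reaching `alert:deadline`, so the cost of a query depends on the size of the local neighbourhood rather than on the size of the whole network.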

Suggested timeline and milestones

How we are going to manage time and tasks:


Week 1: Establishing stable collaboration with the mentor, setting up the development environment, detailed planning for the first iteration.

Weeks 2 and 3: Implementing the "manual input" subsystem (e.g. a simple "ToDo" application) and a basic engine for establishing semantic relations between items (tasks, tags, projects, timestamps, alerts...).

Weeks 4 to 6: Subsystem for accepting data from external services, e.g. Twitter.
 
Weeks 7, 8, 9 and 10: Researching approaches to building a good user interface for processing incoming data and browsing the knowledge base.
 
About 2 weeks are reserved for solving possible unforeseen problems, acceptance testing, debugging and documentation.




Updated: 8.4.2010